Loading Datasets

Data preparation is the first step of the 7-step Rapid Process Troubleshooting methodology. Data is prepared to create a meaningful and effective dataset that can be used to model the process in step 3 of the methodology. At any stage during data preparation, you can load any dataset in your data recipe into the Troubleshooters, provided the dataset contains double fields. Simply select the required dataset and click [Load dataset] in the panel at the bottom of the canvas.

From this panel, it is possible to perform the following actions:

Load dataset

When loading a dataset from your data recipe for further analysis in the Troubleshooter, the data is loaded into available system memory for fast access and analysis. It is therefore important to optimize the dataset sufficiently so that it can be loaded into available system memory. The size of the dataset is determined by both the number of fields and the number of records it contains. The Troubleshooter may display a warning if the dataset is estimated to be too large. Loading a dataset that is too large may affect the stability and performance of the Troubleshooter application and produce unexpected results.
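The exact size check the Troubleshooter performs is not described here, but a rough estimate of the in-memory footprint helps judge whether a dataset is likely to fit. Below is a minimal Python sketch, assuming 8 bytes per double field per record and using the third-party psutil package to read available memory; the 50% safety factor is an illustrative choice, not a documented limit.

    import psutil

    BYTES_PER_DOUBLE = 8  # assumption: each double field stores 8 bytes per record


    def estimated_dataset_bytes(num_fields: int, num_records: int) -> int:
        """Rough in-memory size estimate: fields x records x 8 bytes."""
        return num_fields * num_records * BYTES_PER_DOUBLE


    def likely_to_fit(num_fields: int, num_records: int, safety_factor: float = 0.5) -> bool:
        """Compare the estimate against a fraction of currently available RAM."""
        available = psutil.virtual_memory().available
        return estimated_dataset_bytes(num_fields, num_records) < safety_factor * available


    # Example: 200 double fields x 5 million records is roughly 8 GB before any overhead.
    print(estimated_dataset_bytes(200, 5_000_000) / 1e9, "GB (estimate)")
    print("Likely to fit:", likely_to_fit(200, 5_000_000))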

Learn more about the data types that can be loaded into the Troubleshooters.

  1. Configure timestamp:

    • Select to use one of the timestamp fields, or

    • Create a timestamp with a defined sampling period:

      • The first timestamp will be set to Now.

      • Data will be recorded at the defined sampling period.

  2. All data will be shown and, depending on the number of data points, either the actual values or the average value per bin will be displayed.

Although data preparation operations can handle timestamps to an accuracy of 100 nanoseconds, only millisecond precision is supported when loading data into the Troubleshooters. In addition, the interval between samples must be greater than or equal to 4 milliseconds.
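As an illustration of the "create a timestamp" option in step 1 above, the sketch below (an approximation, not the Troubleshooter's own implementation) generates an evenly spaced series of timestamps starting at the current time and rejects sampling periods below the 4 millisecond minimum.

    from datetime import datetime, timedelta

    MIN_PERIOD_MS = 4  # minimum supported interval between samples


    def create_timestamps(num_records: int, period_ms: float) -> list[datetime]:
        """Generate timestamps starting at 'now', spaced by the given sampling period."""
        if period_ms < MIN_PERIOD_MS:
            raise ValueError(f"Sampling period must be >= {MIN_PERIOD_MS} ms")
        start = datetime.now()  # the first timestamp is set to Now
        step = timedelta(milliseconds=period_ms)
        return [start + i * step for i in range(num_records)]


    # Example: 1000 records sampled every 500 ms.
    stamps = create_timestamps(1000, period_ms=500)
    print(stamps[0], stamps[-1])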

  3. Field categories: Categorize the process variables into targets, process disturbances, adjustable variables and process states. The aim of this step is to define the nature of each field, and to identify which of the variables influencing the process outcome require troubleshooting. Highlight the field name, and click the relevant category button. All input fields should be classified in terms of the process using one of the following categories:

    • Targets

      Classifies all fields defined as targets or expected outcomes in terms of the process. Target variables are those fields that require troubleshooting, analysis and correction.

    • Adjustables

      Classifies all those controllable fields that are directly adjustable on the process and have a direct impact on the expected outcome or process target.

    • Disturbances

      Fields that have an impact on the expected process outcome or target, but are not under the direct influence or control of any system or human interaction, should be marked as disturbances.

    • States

      Fields that can be considered either measured internal states of the process, or inputs to the process that are not directly controlled, should be marked as states. They do not directly affect the target variables, but may be indicative of other variables, measurable or not, that do.

    • Undefined

      Undefined fields have no specific classification.

NOTE: Although highly recommended, variable classification is not a required step. If no categorization of fields is required, you can skip this part of the rapid process troubleshooting project by clicking the Cancel button during variable classification.

  4. Click [OK] when complete, and the data will be loaded into the Troubleshooter for further analysis. The Step 2: Visualization view opens, and the other troubleshooting steps become active.

  5. Step 2: Visualization: When loading a dataset for the first time, the limits of acceptable values for each field are automatically calculated. They are disabled by default. To enable a limit, click on the limit text and select [enable] from the drop-down menu. When switching datasets, it is important to keep the limits disabled, as limits calculated for one dataset can mark the data in another dataset as bad quality (see the sketch after this list).
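The precise rule used to calculate the default limits is not documented here. As a hypothetical illustration of the idea, the sketch below derives low low, low, high and high high limits from the observed data range (using an assumed 5% margin) and keeps them disabled until explicitly enabled.

    def default_limits(values, margin=0.05):
        """Illustrative limit calculation from the data range (not the product's actual rule)."""
        lo, hi = min(values), max(values)
        span = hi - lo
        return {
            "low_low": lo - margin * span,
            "low": lo,
            "high": hi,
            "high_high": hi + margin * span,
            "enabled": False,  # limits are disabled by default when a dataset is first loaded
        }


    print(default_limits([2.0, 3.5, 4.1, 5.0]))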

Loading a second dataset

Only one dataset can be loaded at a time. Loading a second dataset while the first is still loaded has the same effect as unloading the first dataset, and loading the second. All project data, models and data actions configured will be lost.

Unload dataset

To remove the loaded dataset from the Troubleshooter, click [Unload data]. You will be asked to confirm this action.

On clicking yes, the following information will be affected:

    • the dataset will be unloaded from the Troubleshooter, but will still remain within your data recipe.

    • further troubleshooting steps will be disabled.

    • the project data will be lost. This includes:

      • the configured timestamp field

      • the field categorization

      • any limits and lags set for the fields.

    • models configured for the data will be lost.

    • data actions such as Cause+ and SPC+ projects configured will be lost.

Switch

To investigate a system using the same troubleshooting process but with different data, you can simply switch datasets.

Switching datasets requires mapping the fields between the two datasets. Mapped fields must be of the same data type, and the second dataset must contain enough fields so that every field in the initial dataset can be mapped to a field.
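As a minimal sketch of the mapping requirement, the function below checks that every field in the initial dataset is mapped to a field of the same data type in the second dataset. The field names and the simple name-to-type description of a dataset are illustrative; the actual mapping dialog and its checks belong to the Troubleshooter itself.

    def validate_mapping(initial_fields, new_fields, mapping):
        """Return a list of mapping problems (empty if the mapping is valid).

        initial_fields / new_fields: dicts of field name -> data type (e.g. "double").
        mapping: dict of initial field name -> field name in the second dataset.
        """
        errors = []
        for name, dtype in initial_fields.items():
            target = mapping.get(name)
            if target is None or target not in new_fields:
                errors.append(f"{name}: no field in the second dataset is mapped to it")
            elif new_fields[target] != dtype:
                errors.append(f"{name}: mapped field '{target}' has a different data type")
        return errors


    initial = {"FeedRate": "double", "Temperature": "double"}
    second = {"Feed_Rate": "double", "Temp": "double"}
    print(validate_mapping(initial, second, {"FeedRate": "Feed_Rate", "Temperature": "Temp"}))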

Switching datasets will retain the entire project configuration, including:

  • the field categorization

  • any limits set for the fields

  • any lags set for the fields

  • models configured for the data

  • configured data actions such as Cause+ and SPC+ projects

NOTE: By default, limits are calculated for the data range and disabled. These limits include the high high, high, low, and low low value limits. If there is a large difference in the data range between the two datasets, be aware that when switching datasets the new data could fall completely outside the limit range of the initial dataset. This would cause all the new data to be marked as bad quality. To prevent this situation, all limits are disabled by default when a dataset is first loaded.
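To see why disabled limits matter when switching, consider the hypothetical check below: limits calculated for the first dataset are applied to a second dataset with a very different range, and every value outside the enabled limit band is flagged as bad quality. The limit values and data points are made up for illustration.

    def mark_quality(values, limits):
        """Flag each value as 'good' or 'bad' against a limit band (illustrative only)."""
        if not limits["enabled"]:
            return ["good"] * len(values)  # disabled limits do not affect data quality
        return [
            "good" if limits["low_low"] <= v <= limits["high_high"] else "bad"
            for v in values
        ]


    # Limits calculated on the first dataset (range roughly 2 to 5), then enabled...
    old_limits = {"low_low": 1.85, "low": 2.0, "high": 5.0, "high_high": 5.15, "enabled": True}
    # ...and applied to a second dataset with a much larger range: everything is bad quality.
    print(mark_quality([120.0, 131.4, 128.9], old_limits))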

Categories

If you need to categorize fields after a dataset has been loaded, or change the categorization of fields, click on the [Categories] button. The dialog will open, enabling you to specify the correct category for each field. Read more on the different field categories.

To re-assign variable categories on existing data sources:

  1. Access the Data Preparation step from the Project Bar in the Troubleshooters.  

  2. Click the Categories button to display the variable classification dialog.

  3. Follow the same procedure for assigning variable categories as provided for new data sources.

Remove models

Selecting to drop models will remove all configured models from Step 3 in the troubleshooting methodology, as well as configured data actions such as Cause+ scenarios and SPC+ KPI trees from the project.



CSense 2023 - Last updated: June 24, 2025